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Abstract 

We consider quantile estimation using Markov chain Monte Carlo and establish con- 
ditions under which the sampling distribution of the Monte Carlo error is approximately 
Normal. Further, we investigate techniques to estimate the associated asymptotic vari- 
ance, which enables construction of an asymptotically valid interval estimator. Finally, we 
explore the finite sample properties of these methods through examples. An R package, 
mcmcse, makes implementation of the suggested methods easy. 



1 Introduction 

Let 7r denote a probability distribution having support X C R d , d > 1. If W ~ 7r and 
g : X — > R, set V = g{W). We consider estimation of quantiles of the distribution of V. 
Specifically, if Fy denotes the cumulative distribution function of V, then our goal is to 
obtain 

:= Fy\q) = ini{v : F v (v) > q} . 

In most practically relevant statistical settings it is not possible to calculate £ q directly. For 
example, a common goal is construction of posterior credible intervals via calculation of quan- 
tiles of marginal posterior distributions. 
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Our focus is on using Markov chain Monte Carlo (MCMC) methods to approximate £ 9 . 
The basic MCMC method entails simulating a Markov chain X = {Xq, X\, . . .} having invari- 
ant distribution tt; the popularity of MCMC largely is due to the ease with which such a simu- 



lation can be accomplished (Liu, 2001 Robert and Casella, 2004). Define Y = {Yq, Y\, . . .} 



{g(Xo),g(Xi), . . .}. If we observe a realization of X of length n and let Y n u\ denote the jth 
order statistic of {Yq, . . . , then a natural estimator of t; q is given by 

L,g ■= Y n (j+i) where j < nq < j + 1 . (1) 

The estimate £ ni9 will be more valuable if we can also assess the unknown Monte Carlo error, 
£n,<j — iq- We consider doing this through its approximate sampling distribution. 

If {Xq, . . . , X n -i} is a random sample from tt and Fy has a density fy positive at Fy 1 (q), 
then a classical result is that as n — > oo 

ML, g - £g) 4 N(0,g(l - q)/[fv{i q )?) ■ 

Moreover, it is easy to estimate the variance in the asymptotic distribution and hence construct 
an asymptotically valid interval estimator for £ q . Our goal is to extend this approach to the 
setting where X is a Markov chain and consider its application in the context of MCMC. 
Unfortunately, the above conditions are no longer sufficient for a CLT; we must replace the 
assumption of X being a random sample with a mixing condition on the Markov chain. While 
we defer the statement of our theorem until Section [2j we note that the required mixing 
condition is quite weak; specifically, polynomially ergodicity of order 3 will suffice. For now, 
assume there exists a constant 7^ > such that as n — > 00 

v^(ln, g -e g )4N(0,7 g 2 ). (2) 

Note that j q must account for the serial dependence present in a non-trivial Markov chain 
and hence is a more complicated quantity than when X is a random sample. 

Suppose we can construct an estimate of 7?, say 7^, then an interval estimator of £g is 
given by 

£ ±t ^ n 



'n 

where t* is an appropriate Student's t quantile. Such intervals, or at least, the Monte Carlo 
standard error (MCSE), Jn/y/n, are useful in assessing the reliability of the simulation results 
as they explicitly describe the level of confidence we have in the reported number of significant 
figures in £ nq . In this sense reporting the MCSE allows us to have as much confidence in the 
MCMC simulation results as we would if it were possible to simulate a random sample from 
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7r. For more on this approach see Flegal et al. (2008), Flegal and Jones (2011), Geyer (2011), 



Jones et al. (2006) and Jones and Hobert (2001). 



We consider two methods for implementing this recipe. The first is based on the method 
of batch means (BM) while the second is based on regenerative simulation (RS). Batch means 
is based on the CLT in ^ and works by breaking the chain up into batches that are ap- 
proximately independent and then operating on these batches. RS is based on simulating an 
augmented Markov chain X' which allows breaking the chain into independent and identically 
distributed parts and hence requires that we establish a slightly different quantile CLT than 
that in We will show that the BM-based interval is easier to implement, but the RS-based 
interval requires slightly weaker regularity conditions. Moreover, our simulation results show 
that both produce effective interval estimates of £ q . 

The remainder is organized as follows. We begin in Section [2] by setting our notation and 
carefully stating our assumptions. We then establish a CLT for the Monte Carlo error and 
consider how to calculate Monte Carlo standard errors using BM. Next we consider RS and 
establish an alternative CLT. We go on to show how RS can be used for calculating an MCSE. 
In the last part of Section [2j we consider alternative methods such as bootstrap methods for 
time series. In Section[3]we illustrate the use of RS and BM and investigate their finite-sample 
properties in two examples. Finally, in Section [4] we summarize our results and conclude with 
a practical recommendation. 



2 Quantile estimation 

Recall that tt is a probability distribution having support X and let B(X) denote the Borel 
cr-algebra. Throughout we assume X is a Harris ergodic (that is, 7r-irreducible, aperiodic and 
positive Harris recurrent) Markov chain having invariant distribution tt. 

As we noted in the introduction a mixing condition on X is one of the sufficient conditions 
for a CLT. We require some notation before we can describe the mixing condition. For 
n £ {1,2,3,...} let the n-step Markov kernel associated with X be P n (x,dy). Then if 
A G B(X) and k € {0, 1,2,.. .}, P n (x, A) = Pr(A" fc+n G A\X k = x). Let || • || denote the total 
variation norm. Say X is polynomially ergodic of order m if there exists a real- valued function 
M : X -4 R + with E W M < oo and m > such that for all X = x 

\\P n (x,-)-Tr(-)\\<M( y x)n- m . (3) 

There is a substantial body of work concerning polynomial ergodicity of MCMC algorithms; 



see e.g. Douc et al. (2004), Fort and Moulines (2000), Fort and Moulines (2003), Jarner 
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and Roberts (2002), Jarner and Roberts (2007), Jarner and Tweedie (2003). However, in 



the MCMC literature more attention has been paid to establishing geometric ergodicity and 
uniform ergodicity, both of which are stronger than polynomial ergodicity; see, among many 



others, Hobert (2011), Jones and Hobert (2001), Johnson et al. (2011), Jones et al. (2012) 



Mengersen and Tweedie] ( [1996D , [Roberts and Tweedie| Q1996D and |Tierneyl ( |1994[ ). 

We are now in position to give our first CLT. Recall Y = {Yq, Y%, . . .} = {g(Xo), g(X\), . . .}, 
define U(y) = {U (y), U x {y), . . .} = {I(Y < y),I(Y 1 <y),.. .} and set 



a 2 (y) := Var n U (y) + 2 Cov n [U (y), U k (y)} 



(4) 



k=l 



The proof of the following result is easily obtained from Theorem 2 in Jones (2004) and 



Theorem 2 in Yoshihara (1995). 



Theorem 1. Suppose there is e > such that X is polynomially ergodic of order 2.5 + e. If 
Fy has a density fy positive and bounded in some neighborhood of £„, then as n — )• oo the 



sum in 



Q converges absolutely and if o~ 2 (^q) > 



V^(L, q - 6?) 4 N(0, o- 2 (H q )/[fv(!i q )] 2 ) n^oo. (5) 

To obtain a Monte Carlo standard error we need to estimate the variance of the asymptotic 
Normal distribution in TheoremJlJ that is 7 2 (£ g ) := c 2 (£q)/[./V(£<j)] 2 • First, we substitute for 
£ 2 and separately consider estimating fv{£,n,q) and cr 2 (£n,g)- 

Consider estimating fv{£,n,q) — a univariate density estimation problem. Kernel density 
estimation has been studied extensively in the context of stationary time-series analysis (see 



e.g. Robinson, 1983) and many existing results are applicable since the Markov chains in 
MCMC are special cases of strong mixing processes. In our examples we use kernel density 
estimators with a gaussian kernel, for which there are well known conditions guaranteeing 
consistent estimation of the density at a point; see e.g. |Kim and Lee] p005| ) and [Yu| ( fl993"l ) . 

The quantity cr 2 (y), y £ M is familiar. Notice that if U n (y) ■= n" 1 J2i=o Ui(y), then 



n 



V^(Un(y) - E n I(Y < y)) 4 N(0,a 2 (y)) 
by the usual Markov chain CLT for sample means. In this context, estimating o~ 2 (y) is a 



well-studied problem and there 


are an array of methods for doing so; see 


Geyer 


(1996), Geyer 


(2011 


), 


Flegal and Jones 


(2010 


), 


Hobert et al. ( 


2002 


), 


Jones et al. 


(2006) and r 


vlykland et al. 



(1995) among others. 

We will use the method of batch means for estimating <7 2 (£ nj <j). For BM the output is 
split into batches of equal size. Suppose we obtain n = a n b n iterations {Xq, . . . ,X n _i} and 
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for k = 0, . . . ,a n - 1 define U k {£, n ,q) ■= b n 1 Ya=q 1 Ukb n +i(£n,q) ■ The BM estimate of <r 2 (£ n 



is 



K=0 



a„—l 



(6) 



The theoretical and empirical properties of BM methods in MCMC settings have been studied 
by |Flegal et all, ( [2008] ) , [Flegal and Jones] ( |2Q10| ) , |Flegal and Jo^es] ( |20lT| ) and | Jones et"al] ( [20061 ) 
who found that b n = \n l l 2 \ worked well and this is what we will use in our examples. 

Letting fv{£,n,q) denote the density estimator of fv{£,n,q) we estimate of 7 2 (£ g ) with 



n,qj 



yielding an approximate 100(1 — a)% confidence interval for £ 3 , with z a j 2 an appropriate 
standard Normal quantile, given by 



n 



(7) 



This approach is implemented in the R package mcmcse (Flegal, 2012b) which is used to 
perform many of the computations in our examples. 



2.1 Regenerative Simulation 

Regenerative simulation (RS) provides an alternative to BM. RS is based on simulating an 
augmented Markov chain and so Theorem [l] does not apply. We derive an alternative CLT 
based on RS and consider a natural estimator of the variance in the asymptotic Normal 
distribution. 

Recall X has n-step Markov kernel P n (x,dy) and suppose the kernel satisfies a one-step 
minorization condition, that is, suppose there exists a function s : X — > [0, 1] with E^s > 
and a probability measure Q such that 

P(x, A) > s{x)Q(A) for all x £ X and A £ B . (8) 

We call s the small function and Q the small measure. In this case we can write 

P(x, dy) = s(x)Q(dy) + (1 - s(x))R(x, dy) (9) 

where R is the residual measure, given by 



R(x,dy) 



P(x,dy) - s(x)Q{dy) , . 1 

1 - s{x) K ' (io) 

Q{dy) s(x) = 1 . 
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We now have the ingredients for constructing the split chain, 

X' = {(X ,8 Q ),(X 1 ,5 1 ),(X 2 ,d 2 ),...} 
which lives on X x {0, 1}. Given X, = x, then 5i and Xi + \ are found by 



1. Simulate 5i ~ Bernoulli(s(x)) 

2. If 5i = 1, simulate Xi + \ ~ Q(-); otherwise X,i + \ ~ R(x, 



Two things are apparent from this construction. First, by ^ the marginal sequence 
{X n } has Markov transition kernel given by P. Second, the set of n for which 5 n -i = 1, 
called regeneration times, represent times at which the chain probabilistically restarts itself 
in the sense that X n ~ Q(-) doesn't depend on X n -\. 

The main practical impediment to the use of regenerative simulation would appear to be 
the means to simulate from the residual kernel R(- 



defined at (10). Interestingly, as shown 



by Mykland et al. (1995), this is essentially a non-issue, as there is an equivalent update rule 
for the split chain which does not depend on R at all. Given X^ = x, find Xk+i and 5k by 



1. Simulate X^+i ~ P(x, •) 

2. Simulate 5k ~ Bernoulli (r(Xfc, Xk+i)) where 

s{x)Q(dy) 



r(x,y) 



P(x,dy) 



(11) 



RS has received considerable attention in the case where either a Gibbs sampler or a full- 



dimensional Metropolis-Hastings sampler is employed. In particular, Mykland et al. (1995) 



give recipes for establishing minorization conditions as in (|8|), which have been implemented 



in several practically relevant statistical models; see e.g. Hobert et al. (2006); Jones et al 



(2006); Jones and Hobert (2001); Roy and Hobert (2007). 



Suppose we start X' with Xq ~ Q; one can always discard the draws preceding the first 



regeneration to guarantee this, but it is frequently easy to draw directly from Q (Hobert 



et al. 2002 Mykland et al. 1995). We will write Eq to denote expectation when the split 
chain is started with Xq ~ Q. Let = tq < t\ < t 2 < ... be the regeneration times so 



6 



that Tt+x = mm{k > n '■ 5i-i = 1}- Assume X' is run for R tours so that the simulation is 
terminated the -Rth time that a <5, = 1. Let tr be the total length of the simulation and 
N t = Tt — Tt-x be the length of the tth. tour. Recall that g : X — > R, Yi = g(A,), and 
Ui(y) = I(Yi < y). Also, define 

n-l 

S t = S t (y)= U M t = l,...,R. 

i=Tt-X 

The split chain construction ensures that the pairs (At, St) are independent and identically 
distributed. For each y S R set 

r(y) = E Q \(Si - FvivW 2 ] I [^Q(iVi)] 2 . 



Let h : M + — >■ M + satisfy liniR-^oo h (tr) /■ v /tr = 0. Set j = r^q + h (tr) and consider 
estimating £ q with ^ Ti {(j)> that is, the jth order statistic of Y%, . . . ,Y TR . The proof of the 
following CLT is given in Appendix [A] 

Theorem 2. If X is polynomially ergodic of order m>2, then EqN^ < oo and EqS% < oo 
and hence T(y) exists. If, in addition, Fy has a density fy positive and bounded in some 
neighborhood of ^ q with fy differ entiable at £ q , then as R —> oo 



R ( Y mU) - e 9 ) 4 tv(o, r /f v (£,)) . 

Notice £, TR ,q (recall ([!])) requires j such that < j — TRq < 1 and hence we have the 
following corollary. 

Corollary 1. If X is polynomially ergodic of order m > 2, Fy has a density fy positive and 
bounded in some neighborhood of ^ q with fy differ entiable at £ q , then as R —> oo 



To obtain a Monte Carlo standard error we need to estimate 7^(£ g ) := T / f v (£ q ). Just 
as above we substitute £ TR ,q for £ q and separately consider r(^ Tflig ) and fy(£, TR ,q)- Of course, 
we can handle estimating fv(i TR ,q) exactly as before, so all we need to concern ourselves with 
is estimation of ^(S, TR ,q)- 

We can recognize T(y) as the variance of an asymptotic Normal distribution. Let A denote 
the average tour length, that is, A := R^ 1 J2tLi At, and similarly define S := R^ 1 J2t=i 
Now we can estimate Fy with Fr := S/N since 

F R ( y ) = ^?=iS r (y) = J_y Ui{) (12) 
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and hence by the SLLN, as R — > oo, Fr(]j) — > Fy(y) for each fixed y. For strictly stationary 



chains, Tucker (1959) proves this convergence is uniform, i.e. a generalization of the Glivenko- 
Cantelli theorem; see also Rao (JT962) and Tweedie (1977). 

From Theorem [2] we have that EqN^ < 00 and EqS^ < 00 and hence we can easily obtain 
a central limit theorem for Fn{y). Specifically, for each y £ E, as R — > 00, 

R(F R (y)-F v (yj)Ax(p,T(y)) . 



A natural estimator of T(y) (Hobert et al. 2002 Jones et al. 2006) is given by 

1 R 



t=i 



Letting fv(£,r R ,q) denote the density estimator of fv{tr R ,q) we can estimate 7^(6?) with 



fv(£,T R ,q) 



If t R _i a /2 is an appropriate quantile from a Student's t distribution with R — 1 degrees of 
freedom a 100(1 — a)% confidence interval for £ g is 



t>r R ,q ± tR-l,a/2 



TR,qj 



R 



(13) 



2.2 Alternative Methods 

It is natural to consider the utility of bootstrap methods for estimating quantiles and the 
Monte Carlo error. Indeed, there has been a substantial amount of work on using bootstrap 
methods for stationary time-series, some of which is appropriate for use in MCMC; see e.g., 



Bertail and Clemengon 


(1993 


), 


Politis 


(2003 


)■ 



One of the simplest approaches is to use a non-overlapping block resampling scheme, 



sometimes called 'Carlstein's method' (Carlstein, 1986). The basic idea is to split the Markov 



chain, X, into a non-overlapping blocks of length b, where we suppose the algorithm is run 
for a total of n = ab iterations where a = a n and b = b n are functions of n. Then sample 
the blocks independently with replacement and with equal probability and put these blocks 
together (end to end) to form a new series. From this new series obtain an estimate of 
£q. Repeating this procedure p times results in p independent and identically distributed 
estimates. We can then appeal to classical bootstrap results to estimate the Monte Carlo 
error of £ n , g . 



S 



Non-overlapping Resample 






50 100 150 



50 100 150 200 



(a) Original AR(1) Markov chain 



(b) Resampled AR(1) chain 



Figure 1: Plots illustrating non-overlapping bootstrap resampling for time-series data using 
the AR(1) model. 



While the simplicity of this method is appealing, it does not seem to work well in MCMC 
settings. For illustration purposes, consider the first order autoregressive process 



X; 



1,2 



for i = 1, 2, 



Figure la shows a plot of {X\, X2, . ■ ■ , ^200} 



where 6j is an i.i.d. N(0,r 2 ) for i 
with p = .95 for one realization of this Markov chain. Using this data, Figure lb shows a 
single non-overlapping bootstrap replicate with 10 batches, each of length 20. We can see the 
method does not retain the dependence structure of the original Markov chain. 

In general, resampling results in sequences that are less dependent than the original. As in 
the AR(1) example, a chain with strong autocorrelation, block bootstrap resampling can result 
in unrepresentative samples. There are also examples that "can lead to catastrophically bad 



resampling approximations" (Davison and Hinkley, 1997 p. 397). Further, we found block 



resampling to be prohibitively computationally intensive. 



Another alternative is the subsampling bootstrap (Politis et al. 1999). The basic idea is 
to split X into n — b + 1 overlapping blocks of length b. We then estimate £ q over each block 
resulting in n — b + 1 estimates. These estimates can be used to estimate the asymptotic 
variance from Theorem [TJ While this approach is better in terms of computational effort 
than the moving blocks bootstrap it dwarfs the effort required by the methods of the current 
paper. Moreover, the resulting confidence intervals suffer slightly worse performance in finite 



samples. See Flegal (2012a) and Flegal and Jones (2011) for more on subsampling bootstrap 
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methods in the context of MCMC. 



3 Examples 

In this section we investigate the finite-sample performance of the confidence intervals for £ q 
defined at and (13} . While our two examples are quite different, the simulation studies 



were conducted using a common methodology. In each case we perform many independent 
replications of the MCMC sampler. Each replication was performed for a fixed number of 
regenerations, then both the BM- and RS-based confidence intervals were constructed on 
the same MCMC output. For the BM-based interval we always used b n = n 1//2 , which is 
the default setting in the mcmcse R package. In order to estimate coverage probabilities we 
require the true values of the quantiles of interest. These are available in only one of our 
examples. In the other example we estimate the truth with an independent long run of the 
MCMC sampler. The details are described in the following sections. 

3.1 Polynomial target distribution 



Jarner and Roberts (2007) studied Markov chain Monte Carlo for heavy-tailed target dis- 
tributions. A target distribution is said to be polynomial of order r if its density satisfies 
fv(x) = (l(\x\)/\x\) 1+r , where r > and I is a normalized slowly varying function — a par- 
ticular example is Student's ^distribution. We consider estimating quantiles of Student's 
i-distribution t(v) for degrees of freedom v = 3, 5, and 7; the t(v) distribution is polynomial 
of order v. We use a Metropolis random walk algorithm with jump proposals drawn from a 
N(Q, a 2 ) distribution. By Proposition 3 of Jarner and Roberts (2007), a Metropolis random 



walk for a t(v) target distribution using any proposal kernel with finite variance is polynomi- 
ally ergodic of order v/2. Thus the conditions of Theorem [l] are met if v > 5 + e, while the 
conditions of Corollary [T] are satisfied for v > 4; see the first row of Table [TJ 

We tuned the scale parameter a 2 in the proposal distribution in order to minimize auto- 
correlation in the resulting chain (second row Table [T]); the resulting acceptance rates varied 
from about 25% for i(3) with a = 6, the heaviest tailed target distribution, to about 40% 
for t(7) with a = 3. Regeneration times were identified using the retrospective method of 
|Mykland et al. ( 1995 ); see Appendix [B] for implementation details, and the bottom rows Table 



[ljfor regeneration performance statistics (mean and SD of tour lengths). For each of the 10 4 
replications and using each of ([7]) and ( fl~3| ) we computed a 95% confidence interval for £ 9 for 
q = 0.50, 0.75, 0.90, and 0.95. We then compared the observed coverage rate (percentage of 
the 10 4 intervals that indeed contain the true quantile £ 9 ) with the nominal rate of 95%. 
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Target distribution 

t(7) t(5) t(3) 

MCSE estimation BM RS 

Tuning parameter a 3.0 4.0 6.0 

Mean tour length 3.91 4.55 5.94 

SD of tour lengths 3.50 4.15 7.07 

Table 1: Metropolis random walk on t(v) target distribution with N(0,o~ 2 ) jump proposals, 



example of section 3.1. In first row of table "BM" indicates polynomial ergodicity of order 
m > 2.5 + e, guaranteeing the conditions of both Theorem^ and Corollary^ "RS" indicates 
m> 2, guaranteeing the conditions of Corollary 1. 



The results are shown in Table [2j There are at least three things to note about the results. 
First, the empirical coverage rates of BM and RS are comparable, but more closely match the 
nominal rate for RS than BM. Second, as might be expected, agreement with the nominal 
coverage rate is closer for estimation of the median than for the tail quantiles £.90 and £.95. 
Third, despite the fact that the assumptions of our theorems do not hold for the i(3) case, the 
empirical coverage rates are comparable to, even if slightly worse than, the t(7) case where 
the assumptions for both BM and RS are met. Of all three observations it should be noted 
that the situation is improved by greater simulation effort, ie, running the chain for a greater 
number of regenerations. 

Table [H shows the mean and standard deviation of the width of the 10 4 intervals for the 
three cases where both BM and RS achieved empirical coverage rates of at least 0.94. We see 
that intervals are slightly wider on average for RS, while also being less variable than for BM. 
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Estimating £ q of t(v) distribution based on Normal Metropolis RW 





3 = 


.50 


q = 


.75 


q = 


.90 


q = 


.95 


R = 500 


BM 


RS 


BM 


RS 


BM 


RS 


BM 


RS 


t(7) 


0.942 


0.953 


0.930 


0.943 


0.918 


0.927 


0.905 


0.915 


t(5) 


0.939 


0.950 


0.936 


0.945 


0.917 


0.927 


0.902 


0.909 


t(3) 


0.939 


0.948 


0.930 


0.943 


0.915 


0.927 


0.902 


0.911 




Q = 


.50 


q = 


.75 


q = 


.90 


q = 


.95 


= 2000 


BM 


RS 


BM 


RS 


BM 


RS 


BM 


RS 


t(7) 


0.949 


0.955 


0.945 


0.950 


0.937 


0.943 


0.931 


0.936 


t(5) 


0.944 


0.951 


0.941 


0.948 


0.938 


0.944 


0.935 


0.942 


t(3) 


0.944 


0.949 


0.942 


0.946 


0.939 


0.943 


0.930 


0.937 



Table 2: Empirical coverage rates for nominal 95% confidence intervals for £ g , the q-quantile 
the t(v) distribution. Based on 10 4 replications of R regenerations of a Metropolis random 
walk with jump proposals drawn from a Normal distribution. Monte Carlo standard errors are 
given by \/p(l — p)/n and fall between .002 and .004- 
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q = 0.50 and R = 500 





MCSE Method 


Target dist 


BM 


RS 


t(7) 


0.250 0.032 


0.257 0.023 


t(5) 


0.258 0.033 


0.267 0.025 


*(3) 


0.271 0.034 


0.283 0.028 



q = 0.50 and R = 2000 





MCSE Method 


Target dist 


BM 


RS 


t(7) 


0.126 0.011 


0.128 0.006 


t(5) 


0.130 0.011 


0.133 0.007 


t(3) 


0.137 0.012 


0.140 0.008 



q = 0.75 and R = 2000 





MCSE Method 


Target dist 


BM 


RS 


t(7) 


0.140 0.013 


0.143 0.009 


t(5) 


0.147 0.014 


0.149 0.009 


t(3) 


0.161 0.015 


0.164 0.011 



Table 3: Mean and standard deviation of interval width for 95% confidence intervals for 
in 10 4 replications of Normal Metropolis random walk with R regenerations. 
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3.2 Probit Regression 



van Dyk and Meng (2001) report data which is concerned with the occurrence of latent 



membranous lupus nephritis. Let yi be an indicator of the disease (1 for present), xn be the 
difference between IgG3 and IgG4 (immunoglobulin G), and Xii be IgA (immunoglobulin A) 
where i = 1, . . . , 55. Suppose 

Pr(y< = 1) = $ (A) + Pix a + /3 2 x i2 ) 



and take the prior on j3 := (/3q, /Si, /3a) to be Lebesgue measure on E . Roy and Hobert (2007) 



show that the posterior n(/3\y) is proper. Our goal is to report a median and an 80% Bayesian 
credible region for each of the three marginal distributions. Denote the qth quantile associated 

for j = 0,1,2. Then the vector of parameters to be estimated 



with the marginal for j3j as £ g 
is 



U) 



AO) AO) AO) Al) M t(2) e(2) A2) 

S.l >S.5 >?.9 >S.5 >?.9 'S.i 7?. 5 ,5.9 



We will sample from the posterior using the PX-DA algorithm of Liu and Wu (1999) 



which Roy and Hobert (2007) prove is geometrically ergodic. For a full description of this 



algorithm in the context of this example see Flegal and Jones (2010) or Roy and Hobert 



(2007) 



We now turn our attention to comparing coverage probabilities for estimating elements 
of H based on the confidence intervals at ([7]) and (13) for BM and RS, respectively. We 
calculated a precise estimate from a long simulation of the PX-DA chain and declared the 
observed quantiles to be the truth-see Table [4j Roy and Hobert ( 2007 ) implement RS for this 
example and we use their settings exactly with 25 regenerations. This procedure was repeated 
for 1000 independent replications resulting in a mean simulation effort of 3.89E5 (2400). The 
resulting coverage probabilities can be found in Table [5| Notice that for the interval at ([7]) all 
the coverage probabilities are within two MCSEs of the nominal 0.95 level. However, for RS 
only 7 of the 9 investigated settings are within two MCSEs of the nominal level. In addition, 
we can see all the results using RS are below the nominal 0.95 level. 

Table [6] gives the empirical mean and standard deviation of the half-width of both the 
BM-based and RS-based confidence intervals. Notice that the RS-based interval is always 
wider than the BM-based interval. 
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Q 


0.1 


0.5 


0.9 


A 


-5.348 (7.21E-03) 


-2.692 (4.00E-03) 


-1.150 (2.32E-03) 


ft 


3.358 (4.79E-03) 


6.294 (7.68E-03) 


11.323 (1.34E-02) 


ft 


1.649 (2.98E-03) 


3.575 (5.02E-03) 


6.884 (8.86E-03) 



Table 4: Summary for Probit Regression example of calculated "truth". These calculations 
are based on 9E6 iterations where the MCSEs are calculated using a BM procedure. 



q 


0.1 


0.5 


0.9 


ft 


BM 


0.956 


0.948 


0.945 


RS 


0.942 


0.936 


0.934 


ft 


BM 


0.948 


0.943 


0.948 


RS 


0.942 


0.936 


0.934 


ft 


BM 


0.949 


0.950 


0.950 


RS 


0.938 


0.940 


0.937 



Table 5: Summary for estimated coverage probabilities for Probit Regression example. CIs 
reported have 0.95 nominal level with standard errors equal to y/p(l — p)/1000, resulting in 
MCSEs between 6.5E-3 and 7.9E-3. 



q 


0.1 


0.5 


0.9 


ft 


BM 


0.0671 (0.007) 


0.0377 (0.004) 


0.0222 (0.002) 


RS 


0.0676 (0.015) 


0.0384 (0.008) 


0.0226 (0.005) 


A 


BM 


0.0453 (0.005) 


0.0720 (0.007) 


0.1260 (0.013) 


RS 


0.0459 (0.010) 


0.0733 (0.016) 


0.1270 (0.028) 


A 


BM 


0.0287 (0.003) 


0.0474 (0.005) 


0.0825 (0.009) 


RS 


0.0292 (0.006) 


0.0481 (0.010) 


0.0831 (0.018) 



Table 6: Summary of observed CI half-widths for Probit Regression example with observed 
standard deviation. 
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4 Discussion 



We have established quantile central limit theorems appropriate for use in MCMC settings and 
considered two methods for estimating the variance of the asymptotic Normal distributions 
resulting in the interval estimators given at ([7]) and (13). We also studied the finite-sample 
properties of these intervals in the context of two examples. 

While the regularity conditions for ([7]) are slightly stronger than those for (13), the in- 
tervals were comparable in terms of their finite sample properties. Although minorization 
conditions are often nearly trivial to establish, they are often seen as a substantial barrier 
to the implementation of RS. Indeed, it is straightforward to implement the BM-based ap- 
proach in general software-see the mcmcse R package (Flegal, 2012b )-while RS requires a 
problem-specific approach. For this reason we would recommend the interval at ([7]). 



A Proof of Theorem [2] 

We must first show that EqN? < oo and E Q S$ < oo. Note that < S t < N t for all y G R, 
and hence EqS^ < EqN^ < oo. Then the first part of Theorem [2] obtains by the following 
result. 



Lemma 1. (Jones and Latuszynski, 2012) If X is polynomially ergodic of order m > 2, then 
EnN? < 00. 



We will also require a preliminary result before proceeding with the rest of the proof; see 



Iglehart (1976) for related material. 



Lemma 2. If X is polynomially ergodic of order m > 2 and Fy has a density fy positive 
and bounded in some neighborhood of £ q , then T(y) is continuous at £ 9 . 

Proof. Denote the limit from the right and left as lim y _^ x + and lim^j.- , respectively. From 
the assumption on Fy it is clear that 



lim Fy(y) = lim Fy(y) . 



(14) 



Let Zi(y) = Si(y) - Fy(y)7Vi and note E Q [Zx{y)] = since Hobert et al. (2002) show 



E Q Si(y) = F v {y)E Q N 1 for all y £ 



(15) 
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Equations (14) and (15) yield Eq 



lim 



Si(y) 



E, 



Q 



lim^-5i(y) 



limit law and (14) result in 

Eq 



lim Zx(y) 2 


= E Q 


lim Z x {yf 






y^iq 



. The composition 



(16) 



What remains to show is that the limit of the expectation is the expectation of the limit. 
Notice that < Si (y) < N\ for all y G M and 

\Zi(y)\ = \S 1 {y)-F v {y)N 1 \ < S l (y) + N l < 2N h 

which implies E Q [Z^yf] < 4E Q Nf. Since X is polynomially ergodic of order m > 2 we 
have that EqN 2 < oo by Lemma [l] and, the dominated convergence theorem gives, for any 
finite x, 



lim E Q [Z x {y) 2 ] =E Q 



lim Z x (yf 



Finally, from the above fact and (16) we have lim j/ _ !> ^+ Eq [Zi(y) 2 ] = lim^^- Eq [Zi(y) 2 ], 
and hence Eq [Zi(y) 2 ] is continuous at £ ? implying the desired result. □ 

We now return to the proof of Theorem [2j Notice 

p (v^ {Y TR{ j) -Z q )<y)=P (y trU) < e, + y/VR) 

= P^I{Y k <S g + y/VR}>?J 
( TR — 

= p E i 1 ^ ^ z« + y/^i - F ^ + y/VS) 

\fc=l 

>j- t r F v (t q + y/VR 

TR \ 



where 



and 



Wfl, fc = J{y fc < ^ + y/VR} - F v (£ q + y/y/Bj , i = 1, . . . , tr, 
fR 



sr = — (j - t r F v \ £ q + y/VR 
First, consider the sr sequence. A Taylor series expansion yields 

,2 



F V + y/VR) = F V (£,) + -j=f v (&) + f^/y (0 



(17) 
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where ( is between £ q and £ q + y / V R. The definition of j and (17) result in 



SR = J - TrF v - -j=fv (€g) - -7T^-/V (0 

TR V V-R 2jK 

= — I TR? + h (tr) - T R q - -j=TV - -fiffv (0 



ft, (tr) Vg j/ 2 ,, 

-y/y (C?) H — -ti=rJv (0 



-yfv (&) + 



/y(C) 



(18) 



and hence lim^oo sr = —yfv by assumptions on Fy, properties of h (tr) and the fact 
that N -> #(iV r ) a.s. where 1 < E(N r ) < oo. 

Second, consider Wr & 



R 



TR 



TR 



r^ g + y/VR)] 



k=l 



Lemma [2] and the continuous mapping theorem imply 

m 

ll/2 

fc=l 



^£%An(o,i). 



Using (18), (19), and Slutsky's Theorem, we can conclude as n — )• oo 



T K [rfe)] 



SR 



fc=l 



[rfe)] 1/2 , 



i - $ 



[r(£ 9 )] 1/2 



3//y fe) 
[r(^)] 1/2 



resulting in 



(19) 



B Regenerative simulation in example of section 3.1 



The minorization condition necessary for RS is, at least in principle, quite straightforward for 
a Metropolis-Hastings algorithm. Let q(x, y) denote the proposal kernel density, and a(x, y) 
the acceptance probability. Then P(x, dy) > q(x, y)a(x, y)dy, since the right hand side only 
accounts for accepted jump proposals, and the minorization condition is established by finding 



18 



s' and v' such that q(x,y)a(x,y) > s'{x)v'[y). By Theorem 2 of Mykland et al. (1995), the 
probability of regeneration on an accepted jump from x to y is then given by 

s'(x)v'(y) 

r A {x,y) - 



q(x,y)a(x,y) ' 

Letting tt denote the (possibly unnormalized) target density, we have for a Metropolis random 
walk 

i \ ■ ]*(v) 1 1 ^ • / c il • l*(v) i 

a(x, y) = mm < . . , 1 > > mm < . . , 1 > mm < , 1 

for any positive constant c. Further, for any point x and any set -D we have 

q(x,y)> inf j q{x,y)I D (y) . 

yeD [q{x,y) ) 

Together, these inequalities suggest one possible choice of s' and v 1 ', which results in 

inf yg j,{g(x,;/)/g(x,y)} min |c/tt(x), 1} min {ir(y)/c, 1} 
q{x,y)/q{x,y) mm{7r(y)/vr(x), 1} 

For a i(t>) target distribution, a(x, y) reduces to 



min i I — — — ft | , 1 > > min < ( — — — I , 1 )■ x min 



v + y J \ \ c I \ \v -\- y 



and the last component of (20) is given, up to the constant c, by 



min{t! + x 2 ,c} v + y 



2 



v + l 
2 



min {v + x 2 , v + y 2 } max + y 2 , c} 

Since this piece of the acceptance probability takes the value 1 whenever v + x 2 < c < v + y 2 
or v + y 2 < c < v + x 2 , it makes sense to take c equal to the median value of v + X 2 under 
the target distribution. 



The choice of x and D, and the functional form of the middle component of (20), will of 
course depend on the proposal distribution. For the Metropolis random walk with Normally 
distributed jump proposals, q(x, y) oc exp { — ^z(y — x) 2 }, taking D = [x — d, x + d] for d > 
gives 

infgj, M* ,)Mi v)} = exp J 1 _ _ _ 
Q{x,y)/q{x,y) { a z 

For the distributions we can take x = in all cases, but the choice of d should depend 
on v. With the goal of maximizing regeneration frequency, we arrived at, by trial and error, 
d = 2-sJv/{v — 2), or two standard deviations in the target distribution. 
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